The Labeled Segmentation of Printed Books

Authors

  • Lara McConnaughey
  • Jennifer Dai
  • David Bamman

Abstract

We introduce the task of book structure labeling: segmenting and assigning a fixed category (such as TABLE OF CONTENTS, PREFACE, INDEX) to the document structure of printed books. We manually annotate the page-level structural categories for a large dataset totaling 294,816 pages in 1,055 books evenly sampled from 1750–1922, and present empirical results comparing the performance of several classes of models. The best-performing model, a bidirectional LSTM with rich features, achieves an overall accuracy of 95.8 and a class-balanced macro F-score of 71.4.
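
The gap between overall accuracy and macro F-score reported above is typical when a few page categories dominate. The sketch below is only an illustration of how the two metrics can diverge on page-level labels. It assumes the macro F-score is the unweighted mean of per-class F1, which the abstract does not spell out. The toy label sequences and the BODY label are hypothetical and are not taken from the paper's data.

```python
def accuracy(gold, pred):
    """Fraction of pages whose predicted label matches the gold label."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def macro_f1(gold, pred):
    """Unweighted mean of per-class F1 (assumed definition of macro F-score)."""
    labels = sorted(set(gold) | set(pred))
    scores = []
    for label in labels:
        pairs = list(zip(gold, pred))
        tp = sum(g == label and p == label for g, p in pairs)
        fp = sum(g != label and p == label for g, p in pairs)
        fn = sum(g == label and p != label for g, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision + recall:
            scores.append(2 * precision * recall / (precision + recall))
        else:
            scores.append(0.0)
    return sum(scores) / len(scores)


# Hypothetical 20-page book: 18 body pages and 2 index pages.
gold = ["BODY"] * 18 + ["INDEX"] * 2
# Hypothetical predictions: one index page is mislabeled as BODY.
pred = ["BODY"] * 19 + ["INDEX"] * 1

acc = accuracy(gold, pred)
mf1 = macro_f1(gold, pred)
print(f"accuracy={acc:.3f} macro_f1={mf1:.3f}")
```

A single error on the rare class barely moves accuracy but pulls the macro average down noticeably, because every class counts equally regardless of how many pages it covers.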

Similar resources

A Modified Character Segmentation Algorithm for Farsi Printed Text Using Upper Contour Labelling

In this paper, a modified segmentation algorithm for printed Farsi words is presented. This algorithm is based on a previous work by Azmi that uses the conditional labeling of the upper contour to find the segmentation points. The main objective is to improve the segmentation results for low quality prints. To achieve this, various modifications on local baseline detection, contour labeling an...

Persian Printed Document Analysis and Page Segmentation

This paper presents a hybrid method, combining low-resolution and high-resolution analysis, for Persian page segmentation. In the low-resolution page segmentation, a pyramidal image structure is constructed for multiscale analysis and segments the document image into a set of regions. In the high-resolution page segmentation, each region is segmented by connected components analysis into homogeneous regions and identifyi...

The Comparative Effects of Using Electronic Short Story Books and Traditional Printed Texts on EFL Learners’ Reading Comprehension

The purpose of this study was to investigate the comparative effect of using electronic short story books and traditional printed texts on EFL learners’ reading comprehension. For that purpose, ninety female learners ranging in age between fifteen and thirty-five sat for the language proficiency test (PET, 2009) as the test of homogeneity and consequently sixty students were selected based on t...

Evaluating the Readability Level of Authored Fiction Books Selected by the Children's Book Council

Purpose: This research aimed at the investigation of the readability level of 100 prominent authored fiction books for B, C, and D age groups, selected by Children's Book Council of Iran as an official institute for labeling and assigning level of children’s books in Iran. Methodology: Evaluative research method was used for the implementation of this research. Research population consisted ...

Publication date: 2017